Fixation of a Deleterious Allele under Mutation Pressure and Finite 

Selection Intensity 



Michael Assaf+'* and Mauro Mobilia^'* 

University of Illinois at Urbana-Champaign 
Loomis Laboratory of Physics, Department of Physics 
1110 West Green Street, Urbana, Illinois 61801, USA 
Phone: +1-217-333-0929 
Email: assaf@illinois.edu 

^ University of Leeds 
Department of Applied Mathematics, School of Mathematics 
Leeds LS2 9JT, United Kingdom 
Phone: -f44-(0) 11-3343-1591 
Email: M.Mobiha@leeds.ac.uk 

* Both authors contributed equally to this work. 



1 



Abstract 

The mean fixation time of a deleterious mutant allele is studied beyond the diffusion approximation. As in 
Kimura's classical work [M. Kimura, Proc. Natl. Acad. Sci. U.S.A. 77, 522 (1980)], that was motivated by the 
problem of fixation in the presence of amorphic or hypermorphic mutations, we consider a diallelic model at a 
single locus comprising a wild-type A and a mutant allele A' produced irreversibly from A at small uniform rate 
V. The relative fitnesses of the mutant homozygotes A' A', mutant heterozygotes A' A and wild-type homozygotes 
AA are 1 — s, 1 — /i and 1, respectively, where it is assumed that v ^ s. Here, we employ a WKB theory and 
directly treat the underlying Markov chain (formulated as a birth-death process) obeyed by the allele frequency 
(whose dynamics is prescribed by the Moran model). Importantly, this approach allows to accurately account 
for effects of large fluctuations. After a general description of the theory, we focus on the case of a deleterious 
mutant allele (i.e. s > 0) and discuss three situations: when the mutant is (i) completely dominant (s = h); 
(ii) completely recessive (/i = 0), and (iii) semi-dominant (h = s/2). Our theoretical predictions for the mean 
fixation time and the quasi-stationary distribution of the mutant population in the coexistence state, are shown 
to be in excellent agreement with numerical simulations. Furthermore, when s is finite, we demonstrate that our 
results are superior to those of the diffusion theory, while the latter is shown to be an accurate approximation 
only when Nes'^ ^ 1, where Ng is the effective population size. 

Keywords: Fixation; Theory of Population Dynamics and Genetics; Genetic Drift and Selection; Birth- 
Death Processes; Diffusion and Large Fluctuations. 
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1 Introduction 



Fixation is a phenomenon that allows to quantify how the frequency dynamics of a rare allele increases under 
genetic drift and selective forces until it takes over the entire population, and is a key topic in population genetics. 
In particular, the fixation of alleles is crucial to understand g enetic diversi t y and has therefore a ttracted much 



attention since the pioneering works by Fisher and Wright (jFisheii ([1922 



1930 ) 



Wrightl (Il93ll )). The study 



of the fixation time of a del eterious allele has be e n sho wn to be important to shed light on the evolutionary 



dynamics of related diseases (jSlatkin and Rannalal (|2000l )). In addition to selection and genetic drift, mutation is 



another important mechanism responsible of genetic diversity. In each generation, mutant genes arise naturally 
and can be either deleterious, selectively neutral or advantageous. A vast majority of mutants are lost i n a fe w 



generations and only a small n umber of them succeed in fixating the population (jKimura and Ohtal (|1971[ )). 



As already noticed by Fisher (jFisheii (|l922l )) a majority of mutations with large effect are deleterious and 
in these cases it is almost certain that the mutant allele will be eliminated before reaching an appreciable 
frequency. However, it has been reported that for some amorphic or hypomorphic mutations the previously 
deleterious mutations may become only slightly deleteri o us (i. e. almost neutra l ) and, under m utation pressure 



and random drift, become fixed in the species (| Mullen (|l939[ ): 



Wrightl |l977h 



Kimural (|l98d )). Examples of 



amorphic/hypomorphic deleterious mutation s range from the loss o f eyes in cave animals to the loss of ability 



to synthesize vitamin C in some vertebrates (jJukes and Kind (|1975[ )) 



In this context, understanding the combined influence of genetic drift, mutations and natural selection 
on t he dynamics of the allele frequency is clearly an is s ue of fundamen t al importance in evolut i onary biol- 



Kimura and Ohtal |l97ll ): 



Kimura 



(119831); 



Ewend ( 2000 ) 



ogy (jCrow and Kimural (|1970[ ): 
large body of studies d edicated to gene frequency is based on the diffusion approximation [see, e.g.. 



Gillespij (l2004h). A 



mm; 



Crow and Kimura 



EwensI (j2000l ))]. The latter approach is intrinsically a neutral theory where the allele frequencies are 
treated as continuous random variables. The diffusion theory has been extremely insightful as, for instance, it 
has allowed the computation of the time-dependent probabilities of fixation of a gene under selection in an ideal 
(randomly-mating) diploid population of large effective siz e. The diffusion theory has been recently generalized 



to study fixation in spatially structur ed metapopulations ( Whitlockl (|2003l) ) and has been used to statistically 



test the existence of positive selection (|Zeng et al 



(|2007ai bl)) PI. The diffusion approximation has also been used 
to study fixation in quasi-neutral systems where the intensities of selection and mutations arc much weaker 
than the genetic drift's strength. In fact, if s denotes the typical strength of non-neutral evolutionary forces 
(selection and mutation) in a population of (effective) size Af . the d iffusion appro ximation certainly gives accu- 
rate results when Afs is a small quantity (jCrow and Kimural (|l970l ): lEwensI (|2000l )). Using heuristi c argument s, 
Nei has recently proposed to replace the above criteria by a less stringent requirement J < J\f~^/^ ( Neil ( 2005 )). 



^I n those referen ces, the frequency trajectories of the derived alleles are generated using the pseudo-sampling method of 
Ref. l lKimural iigSOD ). see Appendix. 
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The detailed analytical and numerical analysis presented in this work substantiates that Afs^ <C 1 is indeed 
the necessary condition for the predictions of the diffusion approximation to be adequate. While the diffusion 
approximation is valid for a narrow range of vanishingly small selection intensity, it is comm only employed also 



when A/ 'y > 1. As an example, it i s often c onsid ered that s w 0.1% — 1% and J\f w 10^ — 10^ IjKimura and Ohta 



(Il97ll) : 



Crow and Kimura 



19701 ): lEwena (|2000l) ) which yields finite values of AfP ranging from 0.01 to 10. 



In this situation, the assumption of quasi-ncutrality, and therefore the validity of the diffusion approximation, 
appears to be questionable. Nevertheless, to our knowledge there have been no systematic investigations to 
establish the validity of the diffusion approximation, and to improve over it, in systems with arbitrary (finite) 



selection intensity and mutations [ 

In this paper, we accurately compute the mean fixation time and the quasi-stationary properties of a deleteri- 
ous allele in a panmictic populatio n of diploid ind i viduals und e r mut ation pressure and finite selection intensity. 



While Kimura and other authors (jKimural (|l980l ) 



Li and Neil (|l977h ) considered this problem within the realm 



of diffusion approximation, we here adopt a different approach originating from statistical physics. By doing so, 
we are able to capture non-Gaussian and large- fluctuation effects that govern the evolutionary dynamics when 
the selection strength is nonvanishingly weak. Indeed, it has been recently noticed that methods borrowed from 
statistica l physics can be e xtrem e ly insightful to address problems of evolutionary dynamics from a broader per- 



spective (jSella and HirshI (|2005() 



Korolev et al 



(|2010f )). Here, we employ a so-called WKB theory (see below) 
to obtain the mean-fixation time (MFT) directly from the Markov chain (formulated as a birth-death process, 
see Sec. 2.2) governing the stochastic dynamics of the allele frequency. This method, that relies on a power 
series expansion in the inverse of the (effective) population size and a suitable ansatz (i.e. a trial function), 
allows us to accurately account for non-diffusive phenomena like fixation that are triggered by rare large fluc- 
tuations. As main results, in this work (i) we demonstrate that the diffu sion approx imation is adequate only 
when s <^ 0(Af^^^^) and A/" ^ 1, corroborating Nei's heuristic estimate ( Neil ( 2005 )): (ii) we derive accurate 
results for the MFT and the population's quasi-stationary distribution (QSD) when s > 0{Af^^^^). 

The organization of this paper is the following: the model and the methods are presented in the next 
section. There, we first discuss the deterministic dynamics obtained when fluctuations are ignored, and then 
derive a stochastic description in terms of a continuous-time birth-death process (using the fltncss-dependent 
Moran model) that takes demographic stochasticity into account. Our (WKB-based) analytical approach is 
then presented in Section 3. The results for the MFT and the QSD are presented and discussed in Sections 
4 and 5.1. Section 5.2 is dedicated to a careful comparison of our results with the results of the diffusion 
approximation and is followed by a discussion and our conclusions. The numerical simulation method is briefly 
described in an Appendix. 



^ Only recently, it h a s bee n shown that some results of the diffusion approximation are prone to numerical inaccura- 
cies l|Wang and Rannalal l|2004l ')). 
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2 Model and Methods 



a 



We consider a diallelic model of randomly mating population of N diploid individuals with an effective size TV, 
We assume that at a particular locus, where the wild-type allele is A while the mutant is A', the relative fitnesses 
of the homozygotes AA, A' A' and heterozygotes A' A are respectively 1, 1 — s and 1 — h (with s > and h>0) . 



This corresponds t o a ge n eric dia l lelic model wher e the m utant allele i s deleterious (Kimura and Ohta ( 19711 ): 



Crow and Kimural (|l970l ) 



Ewend (|2000l) : 



Gillespie! (|2004l )). Following |Kimural (|l980l) : 



Li and Nei 



mm), in 



addition to random mating we also assume that the wild type mutates irrevers ibly to the (del eterious) type A' 
with a small rate v, which accounts for amorphic and hypomorphic mutations (jKimural (|l980f )). In reality, the 
allele A mutates in various types that are here regarded as a single class denoted by A'. 

A situation of particular relevance in population genetics arises when the selection intensity s is weak and 
the mu tation rate v i s much weaker, i.e . < v <^ s 1, where v is several order o f magnitudes smaller 



than s (jEwend (|2000f ): 



Crow and Kimura 



( 1970 ) 



Kimural (|1983[ ): iKimura and Ohtal (|l97lf )) Q. The MPT of this 



model where the mutant allele A' is cither domin ant, rece s sive o r semi-dominant was treated both analytically 
an d numerically with in the diffusion approach in ([Kimural (|l980l )). while the non-dominant case was considered 



in (|Li and Neil (|l977h ). Here, we are are mainly interested in the MPT and QSD of the allele frequencies in the 
presence of (small but) non- vanishing selection intensity and mutation pressure, when the parameters are such 
that 0<w^s^l,s> 0{Ne ) and the validity of the diffusion approximation is thus questionable. In 
fact, due to the broad use of the diffusion theory, an important question concerns how the selection intensity 
s should vanish with the effective population size Ne for the diffusion approximation to be valid, and we here 
demonstrate that the condition to be satisfied in fact is s <C 0{Ne )■ 

2.1 The deterministic description 

To study the dynamics of this model in the continuum limit, the effective population size is assumed to be large, 
i.e. A^e ^ 1, and the allele frequencies are treated as continuous variables. Denoting by x the frequency of the 
mutant allele A' (the frequency of A is thus y = 1 — x) the setting can be summarized by the following the table: 



Genotypes 


AA A'A' A'A 


Relative fitness 


1 l-s l-h 


Prcquency 


(1 — x)^ x"^ x{l — x) 



Accordingly, the population average fitness is 

w{x) = 1 - sx[x + 2{h/s){l - x)]. 



(1) 



^A^ e can essentially be in terpret ed as the number o f individuals that can breed in each generation, see e.g. 

Refs. ||C row and Kimural (ligTOD : lEweni l|2000h : [Gillespid l|2004 )). 

"Typically, A^e ~ 10* - lO^, s 10"* - 10"^ and v RJ 10"'* - IQ-^. Thus, NeS ^ 1 - 1000, NeV 0.01 - 10, whereas 
Nes^ Si 10"* - 10 and Nev'^ 10"** - 10"-' < 1. 
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The change in the frequency of the aUele A' in one generation St caus ed by r a ndom mating is given by 



Crow and Kimural (|l970l ): 



d x(i) 



xii+6i)-x(i ) = x{t) [-1 + {(1 - s)x{i) + (1 - h){l - x{i))] /w{t)\ (|Ewensl (|2000t ): 

Kimura (1980)) [fj. Furthermore, at each generation the irreversible mutation A — A' causes a change 5x{t) 



v{\ — x{i)) in the A' aUele frequency. By putting 6i ~ {2Ne)~^ we set the timescale in units oit = i/{5i) = 2Nj., 
so that each aUele has, on average, interacted once per time unit. As 5x = dx/dt + 0{N^'^), when A^e ^ 1 and 
all fluctuations are neglected, the allele frequency varies according to the rate equation (RE): 



dx 
'dt 



x{l — x) 
w{x) 



[sx + hil-2x)] +vil -x). 



(2) 



This rate equation is characterized by the absorbing fixed point x*^, ~ 1, corresponding to a homozygous 
mutant population (A' A'). In addition, we shall see that in the biologically relevant setting where w ^ s, there 
is always an attracting fixed point < x* < 1 associated with a polymorphic (heterozygous) population (AA'). 
In fact, in the sequel we will discuss in detail the following two main scenarios: when A' is completely dominant 
(s — h > 0), x\i and x* are both attracting and separated by a repelling fixed point x*. Whereas, when the 
mutant allele is either recessive (/i = 0, s > 0) or semi-dominant (h ~ s/2 > 0), x* is the only attractor and x\, 
is the sole repellor. Below, the latter will be referred to as scenario A and the former will be called scenario B. 



2.2 The stochastic description 

While the deterministic equation ([2]) aptly describes the variation of allele frequency when A^e — > oo, it docs not 
account for fluctuations and stochastic effects that are of great importance in a finite population (N^ < oo). In 
particular, these fluctuations are responsible for the unavoidable fixation of the mutant allele A'. The descrip- 
tion of such a phenomenon therefore requires to adopt a stochastic formulation of the evolutionary dynamics. 
A widely used stoc hastic model o f evolving pop ulations with non-overlapping generations was introduced by 



Fisher and Wright |Fishei] |l922l) 



Wright 



19311) ) . A closely related a nd influential model fo r 



overlapping generatio ns, that can be form ulated as a Markov chain (jFelleii (jl968l ): 



introduced by Moran (jMoranl (|l958 



popu lations with 



Ewend (|2000l )). was then 



19621 )). The Moran model (MM) has the advantage of being mathemati- 
cally more amenable than the Wright-Fisher m o del (WFM) and, when the population size is large, shares its 



properties (jEwend (|2000l ) 



Etheridge et al. 



(120061) 



Blvthe and McKanj pOOTh iP 



Th e MM was originally formu- 



lated f or haploid organisms, but has also been recently used for diploid populations (jPurrett and Schweinsbere 



tooi) 



Eriksson et al 



(|2008f )'). In this work, we will consider a continuous-time version of the MM with fitness- 
dependent transition rates [defined by Eqs. (H])-©]. 

The dynamic properties of continuous-time birth-death processes like the MM are suitably described by 



"■It is worth noticing that s and h are often assumed to be vanishingly small and the population average fitness is approximated 
to be id{x) ~ 1. Here, we do not make this simphfying assumption (whose practical validity is difficult to assess). 

®In the neutral case (no selection and no mutations), a generation in the WFM is J\f/2 times as long as in the MM, where J\f is 
the total num ber of individuals involved in the dynam i cs (here, Af = 2Ne). That is , the timescales of the WFM and MM differ by 
a factor J\f/2 l lCrow and Kimural l lf 970l) : [E^^ j2000l ) : [BWthe and McKand ||2007| ")). 
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the following master equation that gi ves the probabili t y distribution f unctio n (PDF) Pn{t) of finding n mutant 



alleles A' at time t in the population (| Gardiner! (|2002[ ): 



van KampenI (|l992l) ): 



dPnjt) 

dt 



r+_iP„-i(i) +T-+i/^„+i(i) - [T+ +T-]P^{t). 



(3) 



Here, and T~ respectively denote the transition rates from a population state comprising n mutants A' 
to another state respectively with n + 1 and n — 1 mutant alleles A' (see below) . For notational convenience, 
it is useful to denote the effective number of alleles hy N = 2Ne. Here, as the state space is bounded, with 
n G [OjA/"], and n = Af is an absorbing state (all A''s), the transition rates satisfy = 0, and it is assumed 
that P„ (t) = when n < and n > Af. With n = M bein g abso r bing, the stochastic dynamics eventually leads 
to the fixation of the mutant allele A' (jCrow and Kimural (|l970f ): iKimura and Ohtal (jl97ir ): lEwend (|2000l )). 

In the continuous-time MM, the population evolves by pairs of alleles being sampled uniformly with replace- 
ment from the population (the number of birth/death events follows a Poisson process and the waiting time is 
thus exponentially distributed). One of the sampled alleles is designated to be the parent and is copied, yielding 
an offspring (birth) that replaces the other allele that is sacrificed (death). Thus, in the fitness-dependent version 
of the MM that we consider, at each time step and after sampling, the pairs A'A can become cither A'A' (birth 
of a', n — > n -|- 1) with probability pas or AA (birth of A, n — > n — 1) with probability pi\. These probabilities 
are expressed in terms of the reproductive potential, or relative fitnesses, of type A' and A respectively denot ed 
/A'(n) and /a(«), i.e. p^' = {n/J^)f/^'{n)/w{n) and pA = (1 — {n/J^))fA{n)/w{n) (jCrow and Kimural (|l970l )). 
Here, u'(n) = (f — {n/J\f))fA{n) + {n/M)ff^'{n) is the population mean genotypic fitness and, according to the 
table leading to ([2]), one has: 



(4) 



In addition to birth and death events, we also have to account for the A A' mutation s , which can be imple- 



mente d in various ways within the MM (see, e.g. 



Crow and Kimural (|l970t) : 



Ewengj pood) : 



Blvthe and McKane 



(|2007l )V To ensure a neat connection with the deter ministic rate equations (Hi), that we wish to recover in the 
limit A/" — > oo (see below), our approach here, as in ([Blvthe and McKand (|20G7l )). is to consider the mutation 
process as being divorced from the death-birth events. Hence, death/reproduction and mutation take place 
independently, with mutations occurring spontaneously (not necessarily between a death and a birth). Thus, 
on average, at each time step each pair of alleles undergoes the above birth-death process and is sampled as 
described above with probability 1 — v. The rest of the time, with probability w, the A offspring (if any) is 
switched into the allelic type A'. Therefore, following the above discussion and using (|1|), the transition rates 



7 



T+ and r„ of the birth (nA' — >• (n + 1)A') and death (nA' — )• (n — 1)A') processes appearing in ([3]), read: 



y4 



T- 



IJ)PA' + 



(1-«)/a'W 



:^)/a'H + (i-;^)/aH 

(1 - v)h{n) 



(5) 



We notice that if time is measured in units of (1 — v)t, the factors 1 — v appearing in the expressions of 
are chminated, while the mutation rate on the right-hand-side of becomes v/(l — v). As we arc interested 
in the hmit of smaU mutation rates and neglect terms of order 0(t;^), throughout this work we shall simply 
consider ~ v) = w + 0{v'^) and measure time in units of (1 — v)t ~ t. In the continuum limit (where 
Af ^ 1), the RE describing the average number n of mutant alleles A' can be directly obtained from Eq. ([3]). 
It reads {d/dt)n ~ T+ — T^, where upon rescaling time as in Sec. 2.i0, one recovers Eq. © for the frequency 
x{t) = h{t)/Af of mutants. 

While it is generally a very demanding task to extract accurate and useful information from Ec^. ([3]), the 
latter can be investigated within v arious approaches. One ve r y popular and in s ightful approx imation is provided 



by the diffusion theory (see, e.g., 



Crow and Kimura 



Kimura 



(Il983l) : 



Ewenj pOOOl )). that is based on 



a Tayl or-expans i on in A / " ^ <g; 1 of th e mast e r equation lead ing to a Fokker-Planck or Kolomogorov equation 



(KE) dCardineij pOOa) : 



van KampenI (jl992f ): 



Risken 



(|l989l )). For the problem at hand, the backward KE 



associated with the birth-death process ([3]) for the probability density V{x,t) = J\f Pn{t), see below, is given by 



dr{x,t) 
dt 

'CbKE(a;) 



i^hKF.{x ^t)'P{x,t), with 

d I 
M(.)- + -n.)^. 



(6) 



In this equation, M(x) and V{x) represent the deterministic drift and the diffusion terms, respectively. The 
connection between ([6]) and the evolutionary dynamics is made by determining the functions AI(x) and V{x) 
from the original (non-approximate) stochastic processes. For this, one notes that according to ([6|) the mean 
of the allele frequency x and the average of its square obey the equations of motion (EOMs) {d/dt){x{t)) = 
j' dxxV{x,t) ^ {M(x)) and [d/ dt){x'^{t)) = J dx x'^'P{x,t) = 2{xM{x)) + {V{x)). The functions M and V are 
then determined by imposing consistency with the EOMs of {x{t)) and (x'^it)) derived from the original stochas- 
tic processes, here defined by ©-([l]), which yields {d/dt){x{t)) = En("/-^)(c^^ri(i)/d<) and {d / dt) {x^ (t)) = 
^jj^(n/A/')^((iP„(t)/(it). Therefore, for the dynamics based on the MM (H))-®, after rescaling time accord- 
ing to t — > Mt (see Footnote 7), the ensuing EOMs for {x{t)) and (x^(t)) give Myiyi{x) = T+ - T,'^ and 
Vmm{x) = {T,f + T^)/M ■ Thus, for the model that we consider here in the limit w ^ s, ft, <C 1 and A/" ^ 1, one 



''The time appearing in Q is rescaled according to t — ^ J\ft. As in Sec. 2.1, this means that the time step \s &t = J\f ^ and 
therefore, on average, each allele is sampled once per time unit. 
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finds Mmm{x) « .t(1 — x)\s x + h(l 



2x)] + v{l - x) and VSim(2;) « 22:(1 - x)/N. 



In the analysis of Rcf. (jKimural (jl983l )). tlie "microscopic" dynamics was implemented according to the 
WFM based on a discrete binomial sampling process defined by a transition matrix [aij], with i,j = 0, 1, ...,Af 
and ^ {Ml/[jKM - (QtY (1 - Q^)^-^ , w here Q, = q, + - + - 2q0]/u) and q, = 



(i/N) + v[l — {i/N)]. In this case, Kimura showed (jKimural (|l983() ) that AfwFM and Vwfm are respectively 



the mean and the variance of the binomial distribution and therefore, in the limit tj <C s, /i <C 1 and A/" ^ 1 
with X = i/N , MYiYM{x) ~ x{\~ x)[sx + — 2a;)] +v{l — x) and VwFM(a;) ~ x{l ~ x)/Af (adopting the same 
timescale as in Sec. 2.1). Kimura, then used these expressions in the differential operator ChKEix,t) of Eq. ^ 
to obtain the MFT from the diffusion approximation. We thus notice that the expressions of AI{x) coincide 
for the WFM and MM (i.e. Mmm{x) = Myf^uix) = M{x)), but the function V{x) obtained for the MM has 
a factor 2 compared to that of the WFM, i.e. VMM(a;) = 2VwFM(a;) = 2y(a;). This difference stems from the 
fact that (in the neutral case) a generation in the WFM is J\f/2 times as long as in the MM (see Footnote [6]) 
and means that the results for the MM can be mapped onto those of WFM (within the realm of the diffusion 
approximation) via the transformation Af = 2Ne — > Af/2. In other words, the predictions of the MM comprising 
A/" (breeding) alleles should be compared with those of the WFM made up of J\f/2 (breeding) alleles. 

In this work, our goal is to accurately compute the MFT and QSD of the mutant allele A' for the MM defined 
by ([31)- ([S]), beyond the limit of validity of the diffusion theory. Here, when s is finite, fixation is triggered by 
a rare large fluctuation and the non-diffusive dynamics is characterized by a QSD with non-Gaussian ("fat") 
tails. Therefore, we will show that this phenomenon cannot be accurately described by Eq. ^ in the realm of 
the diffusion approximation. It turns out that the eigenvectors and eigenvalues associated with the transition 
matrix of the underlying birt h-death proce s s (|51)- (|5 ]) can be obta i ned analy ti cally and be used to formally 



compute the MFT and QSD 



van KampenI (|l992r ): 



Ewend pOOOf) : 



Gardinen (j2002f )V However, while these 



results are exact, they are non-generic (i.e., limited to diallelic models), and it is very difficult to extract their 
asymptotic behavior. It is therefore desirable to de velop reliable and more ge n erally applicable approxim ation 



methods to study stochastic evolutionary problems (jMobilia and Assai J 201C ^ 



Here, we employ a WKB (Wentzel- Kramers-Bri l louin) theory 



properties of the master equation ^ (jKubo et al 



( 19731) : 



Assaf and Mobilial (|2010l )) 



Landau and Lifshitz 



Dvkman et al 



19771 )). to analyze the 



1994) ). The WKB approximation. 



sometimes referred to as semi-classical or eikonal approximation, is an asymptotic theory frequently used in 
the semi-classical treatment of quantum mechanics. This method has been recently used in the context of 
population dynamics where it allowe d to accurately study exti n ction/fixation from a met a stable state as the 



result of large fluctuations, see, e.g., (jAssaf and MeersorJ (|2010l ): 



Mobilia and Assai 



(|201(3l) : 



Assaf and Mobilia 



(2010|)) and references therein. Our WKB-based analytical treatment is supported by numerical solutions of the 
master equation and Monte Carlo (MC) simulations, whose implementation is briefly described in the Appendix. 
While the crux of the WKB method is given in the next section, let us first gain some insight into the problem 
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Figure 1: On the left shown is a histogram of fixation times (triangles) obtained from MC simulations with 
10^ runs, for Af = 50, s = 0.1, h = 0, and v — 0.04 for the dynamics aceording to the MM. The fixation 
times are found to be exponentially distributed as indicated by the excellent agreement with an exponential 
distribution (solid line) with mean r = 3, 158. On the right shown are the mean fixation times r versus the 
(effective) population size Af for the cases of recessive (upper panel), completely dominant (middle panel), and 
semi-dominant mutants (bottom panel). Here s = 0.1 and v = 0.04. The numerical solution of the master 
equation ([3]) (x's) agree excellently with results of MC simulations (triangles) averaged over 100,000 runs. 



at hand, by looking at numerical results reported in Fig. [TJ On the left panel of Fig. [TJ we report a histogram 
of fixation times obtained from a MC simulation with 10^ runs, see Appendix. The latter is excellently fitted 
by an exponential distribution, which is a characteristic feat ure of systems displaying metasta bility and where 



Assaf and MeersonI ( 2006b 



20071 ))). The average 



fixation/extinction is driven by large fiuctuations (see e.g. 
of this fixation time distribution is the MFT, denoted by r, and is the quantity that is computed in the next 
sections. On the right panels of Fig. [1] we compare the MFT obtained from MC simulations (averaged over 
100,000 runs) and by numerically solving the master equation ([3]), and observe an excellent agreement. This 
indicates that averaging over sufficient MC runs accurately reproduces the stochastic dynamics of the system 
described by the master equation ([3]). 



3 The WKB approach 

In this section, the main general aspects of the WKB approach to treat the birth-death process (O-® are 
presented. The basic idea relies on the fact that in the presence of demographic fiuctuations the process is 
characterized by metastability, with the frequency of A' lingering around the metastable value x* . After a very 
long time of average r, the mutant allele A' eventually takes over and fixates the entire population due to the 
combined effect of selection, weak (yet steady) mutation pressure and random fluctuations. 

To proceed analytically the following key assumptions are made: (i) The population size is large (but flnite). 
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i.e. A/" 3> 1. (ii) Fixation always occurs from the interior fixed point x* that is reached after a typical relaxation 
time tr <C T. That is, we assume the system converges into the vicinity of the fixed point prior to fixation I^. 
The crux of the analytical treatment of the master equation is an expansion of P„ (t) , the system's PDF, 



i nto a series o f eigenvectors and eigenvalues of the stochastic (Markov) generator, (see e.g. 



Assaf and Meerson 



(|2006a . 



20101 ) and references therein). After a relaxation time tr the system settles in the metastable state, 
where the frequency of alleles A' is (sharply) distributed around x*, for a very long time. Such a metastable 
state is e ncoded in the first excited eigenv ector, 7r„, of the Markov chain ([3]) that has not vanished after t > tr, 



(see e. 



Assaf and MeersonI ( 2006b . 



20071) ). The quantity 7r„ therefore determines the QSD (the shape of the 



metastable PDF), while the decay rate of the metastable distribution is given by the first nonzero eigenvalue of 
the Markov chain ([3]), which is the inverse of the MFT. Hence, at t t,.. the slow decay of the metastable PDF 



Pn<M and the slow increase of the fixation probability Pjs/ are given by (jAssaf and MeersonI (|2006b 



20071) ) 



Po<„<A^(t) ^ 7r„e-*/^ and P^(t) ~ 1 - e"*/^ , 



(7) 



where r, the mean decay time of the metastable state, is the mean time it takes the system to fixate. It follows 
from © and that [d/dt)PM = (l/r)e-*/^ = T^.^tta^-i e"*/^, which readily gives 



(8) 



This equation simply means that the mean fixation rate is given by the flux of probability into the absorbing 
state 71 = M. To obtain such a quantity, we need to compute i:j\f-i- It is also valuable to compute the entire 
QSD that is characterized by markedly non-Gaussian ("fat") tails. Note, that the expression ^ is independent 
of the number of mutants alleles n initially present in the population. This reflects the assumption that flxation 
always occurs from the metastable state that serves here as an effective initial condition, see Footnote 8. 

To determine the QSD we now assume that the MFT is exponentially large in A/" ^ 1 (which will be checked 
a posteriori) and substitute the expressions ([7]) into Upon neglecting the exponentially small term i^n/T, 
one thus finds that the QSD obeys the following quasi-stationary master equation: 



= T'+_i7r„_i + P„_|_i7r„+i - [T;+ + T„ ]7r„. 



(9) 



We will now solve this equation using the WKB approximation. Originally, the WKB approximation was 
introduced to treat ordinary differenti al equations where the highest-order derivative is multiplied by a small 



parameter (jLandau and Lifshita (|1977I )). Here, with x = nj M regarded as a continuous variable when A/" ^ 1, 



*This assumption is always satisfied here since it is natural to assume that only a few mutant alleles are initially present. In any 
case, we assume that the initial number of mutants is not too close to M , in which case fixation does not occur instantaneously. 
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we employ the following WKB ansatz (i.e. a trial function) [see, e.g., (jKubo et al. 



mm; 



Dvkman et al. 



(1199D)] 



(10) 



where S{x), Si{x), . . . arc all assumed to be of order unity. Substituting ansatz ()10p into Eq. ([9]) yields in 
the leading order in A/" 3> 1 an equation for S{x), whose sol ution may be used to solve the subleading-order 



equation for 5*1(0;), etc. In analogy with dynamical systems (jLandau and Lifshita (jl977r i). S{x) is referred to 



as the action while Si{x) is called the amplitude. Here, is a constant introduced here for convenience. Note 
that higher-order corrections in the exponent of ()10p arc of order 0{Af~^) and hence negligible. 

Finding the MFT of the mutant species A' can be rephrased as determining the mean time to extinction 
(MTE) of the wild type A in the aftermath of a long-lived metastable coexistence o f both species. The problem 



mm) 



of finding the MTE in a generic two-state system has recently been studied in Ref. (lAssaf and MeersonI 
whose results can be used here. In the following, for the sake of clarity, we briefly repeat the leading order 
calculation of the MFT (or MTE) and QSD and then outline how subleading-order corrections are obtained. 

The leading-order calculations require finding the action S{x). Substituting ansatz (|10p into Eq. (|9]), and 
using the continuous c ounterpart of the transition rates ([5]), 7'±{x) ~ T^inj M\ we obtain in the leading order 



( Dvkman et al 



T+{x) (e^'(") - 1) + T-{x) (e-^'("' - 1) , 



(11) 



whose solution is 



S{x) 



(12) 



This corresponds to an integral over what is called the "optimal path to extinction" . The latter refers to the 
path followed by th e stochastic sys t em th at leaves the metastable state at t = — 00 and arrives at the absorbing 



state at t = 00 (see 



Dvkman et al 



(|l994l ) for a detailed discussion) 



To relate the MFT and the action, we write in the continuum limit T]^ 



and tttv-i ~ 7r(l), and thus, expression ([5]) can be rewritten as 



ir;(i)Ki)- 



'^_^'^\Tl{l)\/M [since T+(l) vanishes] 



(13) 



Furthermore, we have seen that the QSD is peaked in the vicinity of x* , where the relative width of the distribu- 
tion scales as A/""^/^. Therefore, the QSD is strongly peaked around x* when A/" ^ 1, a nd the constant A can be 
found by approximating the QSD by a Gaussian centered at and normalized to unity (jEscudero and Kamenev 



mm 



Assaf and Meerson 



( 20101) ). Indeed, for x ~ x* one can write 7r(x) ~ J\^e-^s{x')-s^{x')~{^r /2)s" {x'){x-x'f ^ 
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whose normalization yields in the leading order A ~ gj^s{x ) _ Therefore, to leading order one has 



(14) 



As a result, using Eq. (fT3|) . one finds 



(15) 



where AS* = >5'(1) — S{x*) is the "accumulated action" over the optimal path to extinction. It is worth noticing 
that in the case of complete dominance the absorbing and interior fixed points, x\, and x* , arc separated by 
a repelling fixed point x*. Therefore, in this case the accumulated action strictly reads AS" = <S'(ic*) — S{x*). 
However, in the limit of weak mutation rate {v <^ s] that is of interest to us, on e has x*, ~ 1. For the thr ee 
cases of interest, we thus have AS* ~ S[l) — S{x*) (jEscudero and KamenevI (j2009l ) : lAssaf and MeersonI (|2010l )). 

Expressions (fTi|) and (fT51) therefore give the generic leading-order results for the QSD and the MET of 
the problem, whereas the action function S{x) has to be calculated separately for the cases of completely 
dominant, semi-dominant and recessive mutant alleles (see Section 4). The ensuing results arc valid provided 
that AfAS ^ 1 (for the MET to be exponentially large as required by the WKB ansatz), along with the 
necessary condition Af ^ 1. 

The calculation of the subleading corrections to the MET is more involved. It has in fact been shown that 
for a generic t wo-state problem ex t inctio n (and therefore fixation) occurs via two scenarios, called scenarios A 



and B in Rcf. (jAssaf and MeersonI (|2010|)). Here, the cases of a semi-dominant and completely recessive mutant 
allele, where the absorbing (x^,) and interior {x*) fixed points of ^ are respectively repelling and attracting, 
correspond to scenario A. On the other hand, the case of a completely dominant mutant allele where both x\, 
and X* are attracting and separated by a repelling fixed point, corresponds to scenario B. We now quote the 



result s and outline the main steps to the subleading-order calculations and refer to Ref. (jAssaf and Meerson 



(|2010f )) for technical details. To determine the subleading-order contribution to the QSD and MET in problems 
belonging to scenario A, the WKB solution needs to be matched with a recursion solution of the (quasi- 
stationary) master equation Q in the close vicinity of the absorbing state n = J\f. This is because the WKB 
solution (fT4|) is only valid sufficiently far from the absorbing state. In Scenario B, the WKB solution already 
breaks down in the vicinity of the intermediate repelling fixed point a;*. Thus, the master equation ([9]) has to 
be solved in the a close vicinity of (by using, e.g., a Eokker-Planck approximation). This solution needs to 
be matched on the one hand with the WKB solution for x < xl, and on the other hand with the recursion 



solution of Eq. ^ when x > x* (jAssaf and MeersonI (|2010l )). 



Implementing these steps and using the action (fT2)l . one finds for the QSD (jAssaf and MeersonI (j2010( )') 



T^(j.) ^ V S" {x*)r+{x*) ^^^f[six)-s{x'■)] 
^2T:NT+{x)r-{x) 



(16) 
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while the MFT in Scenarios A and B is respectively given by r^^ and r^^, where 



T+ix*)iR-l)^/S^ ' T+ix*)^S"{x*)\S"ix*)\ 

with R = 7j1(1)/7^(1). These results hold for two-state populations undergoing generic single-step processes 0. 
It is worth noticing that the subleading prcfactors in ((T5)) and P7)) scale as some power oi J\f ^ 1. 



4 Results for the MFT and QSD 

In this section we present our results for the MFT and QSD in the cases of (i) complete dominance, (ii) recessivity 
and (iii) semi-dominance and focus on the limit where the mutation rate is much smaller than the selection 
intensity, i.e. u <C s <C 1, sec Footnote 4. To ensure the validity of the WKB treatment, we also assume that 
s ^ A/'^^, i.e. the selection intensity s is "not too small" (see below). For all cases (i)-(iii) we shall give the 
detailed analytical derivation of the action (|12l) to order 0{v) that yields the leading contribution to the QSD 
and MFT according to (fT4|) and . We also check our findings against results of numerical simulations and 
report the analytical results for the MFT and QSD that include pre-cxponcntial contributions. 
In the continuum variable x = n/Af, the transition rates in our problem are given by 

l-sx[x + 2{h/s){l-x)]^''^^ ■''''^ ^-^""^^ l-sx[x + 2{h/s){l~x)Y 

For convenience, in the rest of this section we will work in terms of the variable y ~ 1 — x corresponding to 
the frequency of the wild type A. In terms of the y variable, the absorbing state corresponds to j/ = 0. Also, 
under the mapping a; — > y = 1 — .x the transition rates are transformed into 7± (x) — > Tip (y) ■ 

4.1 Results in the case of complete dominance {s = h > 0) 

When the mutant allele A' is completely dominant, i.e. s — h > 0, the rate equation ^ admits the following 
interior fixed points {y = 1 — x) 

^ l±V(l + 2^;)2-4^;(l + ^;)As 

2(1 + 1;) ■ ^ ' 

As we consider w <C s ^ 1, these expressions can be simplified, yielding ~ 1 — v/s + O(u^) and y*_ = 
(1 — s)v/s + 0{v'^). It can be readily checked that y'^ and yt^ are respectively attracting and repelling. Adopting 
the notation of Sec. 2, we can write y* ~ and y*u~y*-^ a-nd in this case the metastable state y* and the 
absorbing state j/a' = are separated by a repelling fixed point y * , which indicates that this problem corresponds 



^In the case where the rescaled transition rates 7±(x) also include subleading order c orrections of order 0(N an additional 
0(1) prefactor enters the final results I I16II and II17I I. see Ref. jAssaf and MeersonI | |2010|) ). 
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to scenario B of extinction. In this case, the transition rates (1181) read: 



_ {^-y)yHy-^) + ^ ^_,._y[s{y-i){vy + v + i) + v-y + i] 



1 - .5 (1 - 2/2) 



l-s(l-y2) 



Thus, 



r-(x) i-s 



(20) 



T+{y) l-s{l-y) {l^y)[l-s{l-y)] 
The main contribution to the MFT and QSD is determined by the action (fT2|) . To leading 0{v) order, we find 



S{y) - - y HT+{OIT-{i)] di^y- 



y] hr f 1 + ^ V '^^^ + t: ' + Ojv')- (21) 



1-s 



2(1-.) 



Therefore, the accumulated action over the optimal path to extinction is given by 



AS = S{y:)-S{y*)^-Jn(^^ 



1 - s 



V s 



3s (1 - s) ln(l - s) 

y ^ 2 



w + 0(w2), (22) 



while, from to order 0(u), one finds S"'(y*) ~ s^/v , S"{y*) ~ -s. 

Therefore, using Eqs and p7)) . the MFT and QSD in this case are explicitly given by 



(23) 



where S{x) is given by Eq. (|2ip with x — 1 — y- Note, that while we have included here only leading-order terms 
with respect to s <C 1 in the prefactors, the exponent remains a nontrivial function of s, see Eq. pip. This is 
because even though Af^^ <C s <C 1, we are not making any assumptions regarding the smallness of terms such 
as Afs^/^, J\fs'^, etc. In fact, as these terms are exponentiated in ([2^ . all of them must be kept. 

The validity of these results requires first of all that JVAS ^ 1 (for the WKB theory to hold), that is, 
s 3> Af^^ (namely s cannot be too small). Furthermore, since we have neglected terms of order A^, v 

must satisfy v ^ M^^^"^ . In addition, the theory assumes that y* and are sufficiently separated from the 
boundaries y = 1 and y = 0, respectively, with a Gaussian distribution in the close vicinity of y*, which also 
requires that v ^ M^^ o (in addition to s <C 1). Overall, the validity criteria can be summarized by 



(24) 



These conditions are generically fulfilled in the system that we consider in this work (see Footnote 4). 

^''The requirement v 3> is only of technical convenience. This condition does not stem from our theoretical approach; 

without such a condition, y* would not necessarily be sufficiently separated from y = 1 to allow a normalization by a Gaussian 
approximation, see text. Yet, up to a multiplicative factor, one would obtain the same results as reported here. 
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4.2 Results in the case of complete recessivity {h = 0, s > 0) 

When the mutant aUele A' is completely recessive, i.e. s > and h = 0, the rate equation ([2]) admits a single 
attracting interior fixed point that reads 



* -1 

y = 1 



s{l + v)' 



(25) 



Again this term can be simplified for v <^ s into y* = 1 — \fvjs + 0(v'^l'^\ As explained above, this case 
corresponds to scenario A of extinction. In this case, the transition rates read: 



r+(2/) 



(1 - y)y 

1 - s (1 - 



and T_ (y) = y 



1 + f 



y 



(26) 



Therefore 



r-{x) 
n{y) 



i-s(i-y) + 



s(i-y) 



and, to order ©(u), the action S'(y) given by Eq. is 

5(2/) = - r ln[r+(0/r-(e)] - -y+ + ln[l-,5(l-2/)]-(l - y + ln(l - y) + 1^ ln[l - ,5(1 - y)] \ v. 



Up to order 0{v'^), the accumulated action between y = and y — y* is thus given by 



(27) 



AS* = S{0) - S{y*) = 1 + ln(l - s 



ii,!:_3_i-. 

2 s 2 s ^ ' 



v + 0{v^/^). (28) 



Using these results, and the fact that S"{y*) ~ 2s, Eqs. and ([T7)) become in this case 



(29) 



where <S'(a;) is given by Eq. (P?]) with x ~ 1 — y and AS" by (pS)) . Here, similarly as in the completely-dominant 
case, the WKB results (P^)) are valid as long as condition holds (see Footnote 10). 

4.3 Results in the case of semi-dominant case {h = s/2, s > 0) 

When the mutant allele A' is semi-dominant, i.e. s > and h = s/2, the rate equation also ^ again admits a 
single attracting interior fixed point that reads 



2/ = 1 - 



2?; 



(30) 
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Similarly as before, for w <C s this expression simplifies to = 1 — 2w/s + 0{v'^). This case again corresponds 
to scenario A of extinction. Here, the transition rates (|T8|) read: 



r_^(^y)- ii-y)yi2-s{l-y) ^_ _ y[{l-y + v)- s{l - y/2 + v){l - y)] 



1 - .s(l - y) 



1 - s{l - y) 



(31) 



Thus, the action function [Eq. (|12p ] and accumulated action are here explicitly given by 



1 - 



2 - ,5(1 - y) 



- ln[2 - s(2 - y)] - 



2{(1 - s) ln[2 - s(2 - y)] +Hl~y)} ^ ^ ^^^^^ 



2-s 



(32) 



AS = S{0)- S{y*) = 



n(2-s)ln(^)-ln(l-s)] 



^[1 - 2(2 - s) ln(2 - s)] - 2(2 - s) |2 In (^ri) - s " 2) ln(2 - 2.s) + In (^^^7^)] } 



s(s-2)2 



v + Oiv"^) (33) 




100 200 300 400 

N - n 



500 



Figure 2: Shown is the QSD 7r„ of the number of wild type in the cases of complete recessivity (top panel), 
complete dominance (middle panel) and semi-dominance (bottom panel). The parameters are A/" = 600, s = 0.25 
and V = 0.025. The numerical solution of the master equation ([3|) (x symbols) is compared with the predictions 
of the WKB approximation (solid curves) in the "bulk" (i.e. away from the states n = Af and n — 0) given 
by (|23|) . (|29|) and (p4|) . In all cases an excellent agreement is observed between the WKB and numerics in the 
regions of applicability of the WKB. In the top panel, for the sake of comparison we also show a Gaussian 
approximation of the QSD (dashed line). Inset of the top panel: the WKB and Gaussian approximations in the 
vicinity of the (metastable) fixed point . While it agrees well with the WKB solution in the close vicinity of 
n*, the Gaussian approximation exponentially deviates from the WKB and numerical solutions near the tails. 

Using these results, and the fact that S"{y*) ~ s^/(4u), Eqs. ((T6)) and (fTT)) become in this case 



r = 



sJv V 27rA/ 



(34) 
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Figure 3: Logarithm of the mean fixation time, hir, as a function oi Ms for the MM dynamics defined by ([3])- 
([S]) in the cases of complete recessivity (top panel), complete dominance (middle panel), and semi-dominance 
(bottom panel). The parameters are Af — 200, v = 0.005, while the selection intensity s varies. The solid lines 
are the WKB results given by Eqs. 1^^, ([Ml), and ([M)) for the top, middle and bottom panels, respectively. 
The symbols x are results of the numerical solution of the master equation ([3]), while the triangles are results 
obtained from MC simulations averaged over 100,000 runs, see Appendix. One can see a very good agreement 
between the WKB and numerical results. (Here, the MC simulations have been carried out up to Inr « 20 
above which they become excessively time consuming.) 



where S{x) is given by Eq. ((32)) with x = 1 — y and AS" by p3|) . Similarly as in the completely- recessive case, 
the WKB results here are valid as long as condition (p4)) is satisfied (sec Footnote 10). 



5 Discussion of the results and comparison with the diffusion theory 



We now analyze the various results obtained for the QSD and MFT and then compare the pre d iction s of the 



WKB treatment for the MFT with those derived by Kimura using the diffusion theory (jKimura 



5.1 Comparison and analysis of the results for the QSD and MFT 

Figures [2] to [4] summarize the results obtained for the QSD and MFT in the cases of (i) complete dominance, 
(ii) complete recessivity, and (iii) semi-dominance of the mutant allele A'. In Fig. [5] we compare the predictions 
of the WKB theory with the numerical solution of the master equation ^ and excellent agreement is observed 
in all three cases. In particular, in the top panel we remark that the QSD is bell-shaped and Gaussian only in 
the close vicinity of the metastable state = Afx* (see inset of top panel of Figure [2]), while the tails of the 
QSD are clearly asymmetric and non-Gaussian. Note, that differently from the QSDs of the top and bottom 
panels, corresponding to extinction scenario A, the QSD in the middle panel co rresponds to extinct i on sce nario 
B and displays an increase of probability towards the absorbing state n = J\f (jAssaf and MeersonI (|2010|)). In 
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this case the WKB approximation breaks down in the vicinity of the repelhng fixed point A/"?/* , where another 
approximation has to be used instead (see Sec. 3). 

In Figs. [3] and m we compare the predictions of the WKB resuhs ^ for the MFT with those 

obtained from the numerical solution of the master equation and with MC simulations averaged over 100, 000 
runs. Very good agreement is observed in all cases. As predicted by our theoretical results, and since AS" 
increases with s, in all three cases, the MFT displays exponential growth with J\fs (i.e. the effective population 
size multiplied by the selection strength), as found in Fig. [31 Yet, as v is essentially the drift towards the 
absorbing state, the MFT decays when the mutation rate v is increased, as reported in Fig. The systematic 
deviations seen in Fig. 2] as w is increased (and approaches the value of s) stem from the breakdown of two 
requirements of the WKB theory, namely AfAS ^ 1 and v ^ s (see insets for the dependence of AS on v). 




U I I I I I ■ ■ ■ I 

10"^ V 10"^ 



Figure 4: Logarithm of the mean fixation time, Inr, as a function of mutation rate v for the MM dynamics 
defined by (I3|)-(l5|) in the cases of complete recessivity (top panel), complete dominance (middle panel), and 
semi-dominance (bottom panel). The parameters are J\f = 200 and s = 0.1. The solid lines are the WKB 
results given by Eqs. ([M)) . and ([M)) for top, middle and bottom panels, respectively, while the x symbols 

are the numerical solution of the master equation ^ , and the triangles are results of MC simulations averaged 
over 100, 000 runs, see Appendix. One can see a very good agreement between the WKB and numerical results. 
This agreement, however, deteriorates as v is increased such that MAS becomes 0(1), see text. The insets 
show the dependence of MAS on v given by Eq. (j^ . (^5)) . and respectively. 



5.2 Comparison between the diffusion theory and the WKB treatment 



We now compare our WKB- based predictio ns for the MFT with those obtained by Kimura using the diffusion 



theory [Eqs. (11)-(13) of Ref. (jKimural (|l980l ))]. For such a purpose it suffices to focus on the leading contribution 
to the MFT and consider its logarithm. We demonstrate that in the subregime v s ^ M^^l"^ , the predictions 
of the diffusion theory coincide (to leading order) with those of our theory, while over a wide range of selection 



intensity, M <C s <C 1, the diffusion theory is plagued by exponentially large errors. 
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First, it is useful to simpliiy the result obtained by Kimura [sec Eq. (11) of (jKimural (|198ClD )]. which reads 



KIM I 
"diff 



(35) 



where B{x) = Mxlsx + 2h{l — x)]/2. According to the above discussion, here we have put 4A^p — >■ 2N p = Af 



to map the ("microscopic") WFM considered by Kimura to derive his diffusion theory in (jKimural (|l98Cll )) onto 



the MM dynamics (01)- ([S]) discussed in this work. Our goal here is to evaluate the asymptotic behavior of ([55]) 
in the limit A/'^l,u^s<Cl, and Afs ^ 1, where our WKB-based predictions are certainly valid. For this 
we rewrite Eq. ([55)1 as 



V g-(A^/2)/(0 



(36) 



where 



f{x) = x[sx + 2h{l - x)] - 2t;ln: 



(37) 



As the function in the exponent is rapidly varying, the inner integral can be evaluated using a saddle-point 
approximation while the outer integral is evaluated by using a Taylor expansion. For the saddle-point calculation, 
the maximum of the function —/(f) is attained at f* ~ \pvjs for h = 0, f* ~ vj s when h ~ s, and f* ~ 2v/ s 
for h ^ s/2. Wc note that f* coincides (to leading order) with the attracting fixed point x* in all three cases of 
interest. Therefore, the inner integral of Eq. ((36)) can be approximated aslll|/J'[e(l-0]~^e-(^/2)/(«)de ^ [^(1- 
r)]"^ /fl e-(^/2)/(r)-(A^/4)/"(r)(«-r)'df _ e-(^/2)/(«*). As this integral turns out to be independent of 77, 
and /(f) is an increasing function on f* < f < 1, the main contribution to the remaining outer integral of p6p 
arises from 77 = 1. Thus, Taylor-expanding the integrand about 77 = 1, to leading order the remaining integral 
becomes /^^ e(^/2)/(r,)^^ _ j-i g(AA/2)/(i)-(AA/2)(i-„)/'(i)^^ _ g(A^/2)/(i)^ Therefore, using dSZ]), substituting the 

above integrals into (j36p . and neglecting logarithmic corrections, Kimura's result psp becomes: 



{x)c,{Ul2)[f{l)-f{x*)]. 



(38) 



We now consider the predictions of the WKB theory for the logarithm of the MET in the limit w <C s ^ 1 
(with s ^ to ensure the validity of the WKB treatment). It follows from Eq. ([TS]) that 



InrwKB '^^ Ml^S = N[S{\) - S{x*)\ 



(39) 



for all three cases (i)-(iii) considered here. For the sake of comparison with the predictions of the diffusion 



^^It is justified to set the boundaries of tlie inner integral to ±00, as the integrand of Eq. I|36| l with respect to rj is regular at 
r/ — ^ 1, and the peak of the Gaussian is positioned sufficiently far from the boundaries in all three cases h = 0, h = s/2 and h = s. 
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Figure 5: Logarithm of tlic MFT, Inr, as a function of Ms^ in the case of recessive mutants (left panel), and 
dominant mutants (right panel). The parameters are M = 1000, v = 0.001, and s = h > varies. The solid 
curves correspond to the WKB result ([T5|) Inr ~ AfAS with (|22|) . The x symbols are the results obtained 
from the numerical solution of the master equation the triangles are MC simulations averaged over 100, 000 
runs (see Appendix), while the dashed curves correspond to the result ([35]) obtained from the diffusion theory. 
The WKB and numerical solutions compare excellently over the entire range of parameters, while the results 
of diffusion theory are found to fairly agree with the others for A/'s^ < C'(l). However, for Afs'^ ^ 1, the 
dashed curves deviate systematically from the others; in this regime, the diffusion approximation exponentially 
deviates from the WKB approximation and numerics. Insets: ratio oi Ihtivkb and Inrpp illustrating that the 
deviations between twk b and Tp p increase exponentially when J\fs'^ 3> 1 . 



theory, with ([T^ and we notice that to linear order in s and h one has 



S(i) 



+ h{l~x) - vlnx + 0{s^,h'^,sh). 



(40) 



From Eqs. PO)) and (P7|) . it is clear that the action S{x) is related to f{x) by S{x) — f{x)/2. It follows from 
this discussion that in the limit w <C s <C M^^^^ (with Ms S> 1) the leading-order logarith m of the WKB result 
([M)) exactly coincides with the result ([55t obtained by Kimura using the diffusion theory (jKimural (Il980( )). 



However, since the WKB is applicable for any s <C 1, one can calculate the next-order corrections to the 
exponent of the MFT p9|) and this indeed leads to the following accumulated action: 



AS= < 



3(1) - S{x*) ~ f - I [1 - In (v/s)] + ^ for = 
S{xl) - S{x*) ~ f - u [1 - hi{v/s)] + 4 iorh = s 
S{1) - S{x*) ~ f - w [1 - In i2v/s)] + 4 forh = s/2. 



(41) 



where we have used Eqs. (PT|) . (P7)) . (P^ . and PO]) and the expressions of = 1 — y* and = 1 — y,* determined 
in Section 4. From Eqs. ([55)1 and (HU, one can see that already the next-order corrections 0{s^, h^, sh) are not 
captured by the diffusion approximation, and thus, the difference between the predictions of the diffusion and 
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WKB predictions is given in the leading order by In twkb — In TKim ^ Afs'^ . It is thus clear that when A/'s^ 3> 1 
(or s 3> Af^^^'^) there are exponentially large deviations between the predictions of the diffusion theory and those 
of the WKB approach. On the other hand, as illustrated by Figs. [3][5l the results of our WKB treatment are 
in excellent agreement with numerics over the entire r egion of pa rameters. Therefore, our theory substantiates 



and quantitatively supports Nei's heuristic argument (jNeil (j2005l )). 



These findings are illustrated in Fig.[5]for the case of completely dominant and completely recessive mutants. 
[Similar results (not shown here) are also obtained for the case of semi-dominance {h = s/2 > 0)]. In Fig. [5l 
the WKB-based results Iutwkb arc compared with the logarithm of the expression ([55]) obtained from the 
diffusion theory, as well as with the numerical solution of the master equation ^ and MC simulations. From 
this comparison, it appears that in the regime where u <C s <C M^^^'^ the diffusion theory is in fair agreement 
with the predictions of the WKB theory and numerical simulations. However, for s ^ A/"^^/^, the predictions 
of the diffusion theory are plagued by exponentially large errors. 



6 Discussion and Conclusion 

In this work we have considered a model motivated by the problem of fixation in systems experiencing amorphic 
or hypermorphic mutations, and studied the dynamics of a panmictic diallelic population of diploid individuals 
at a single locus subject to (small) mutation pressure from a deleterious allele. Such a model is characterized 
by metastability and its long-time dynamics is therefore governed by large fiuctuations. In this case, the system 
fluctuates around the metastable state (forming a quasi-stationary distribution about it) until it is driven into 
the absorbing state where the population is composed of only the mutant allele. Since the rare large fluctuations 
that trigger fixation in this system a re ill-described by the diffusion approximation that was previously used 



to study this problem (jKimural (|l98Cll )). we have here adopted a different approach based on the WKB theory. 
This theory has allowed us to accurately determine the system's mean fixation time (MFT) and quasi-stationary 
distribution (QSD) for arbitrary (finite) selection strength and weak mutation rate. 

Our treatment is based on a stochastic formulation of the dynamics in terms of a birth-death pr ocess (Markov 



chain) , where the transition rates are given by a frequency-d ependent vers i on of th e Mora n model (jMorarJ (jl958 



19621 )). which is closely related to the Wright-Fisher model (IWrightl (|l931l ): 



Fished (|1922D ) [see Section 2.2]. The 
master equation associated with the birth-death process has been treated using the WKB approximation, which 



is a po wer-series expansion in the (effective) population size based on an exponential ansatz (jLandau and Lifshitz 
(|l977l )l. Using this approach we have investigated the fixation phenomenon when the deleterious mutant allele is 
(i) completely dominant, (ii) completely recessive, (iii) semi-dominant, and our main findings are the following: 

• We have analytically calculated the MFT for scenarios (i)-(iii), including the sublcading-order corrections 
to the MFT that were found to scale as some power of Af. Our predictions are found to be in excellent 



22 



agreement with numerical solutions of the master equation ([3|), and stochastic Monte Carlo simulations 
(see Appendix). In all cases we have found that the MFT's exponent grows monotonically with Afs, 
where A/" ^ 1 is the effective population size (up to a constant factor) and s is the mutant allele's selection 
intensity. We have also found that the MFT decreases when the mutation rate increases [see Eq. (|4ip ]. 

We have analytically calculated the QSD up to subleading order for the cases (i)-(iii), and verified our 
results by comparing them to numerical simulations. In all cases the QSD, centered around the metastable 
state, is found to be markedly non-Gaussian when s is nonvanishingly small. We have also found that the 
shape of the QSD's in cases (ii)-(iii) di ffers from that of case (i), as cases (ii) and (iii) correspond to a 



different fixation scenario than case (i) (jAssaf and MeersonI (j2010f )) 



• Our (leading order) predictions for the MFT, based on the W KB theo r y, hav e been compared with 
Kimura's predictions obtained from the diffusion approximation (jKimural (|l983l )). This comparison in- 
dicates that the diffusion approximation is valid for J\fs'^ <C 1. Yet. when nonlinear contributions of the 
selection strength are no longer negligible, the predictions of the diffusion approximation are found to 
be exponentially flawed and their inaccuracy grows with the (effective) population size, as illustrated by 
Fig. [5j This demonstrates that Afs'^ ^ 1 is the weak selection limit w here t he di ffusion theory is an 



adequate approximation, which substantiates a recent heuristic argument (jNei 



Finally, these results shed further light on the interplay between genetic drift and selection. They also illustrate 
that in systems exhibiting metastability and governed by rare large fluctuations, the diffusion theory is well- 
suited only in a narrow range of parameters (weak selection limit) where the dynamics is almost neutral. 
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Appendix: Description of the stochastic simulations 



In this A ppendix 



we bri efly outline the stochastic Monte Carlo (MC) method, using an algorithm due to 



Gillespie (|GillesDid (|l977l )). which is often used to simulate stochastic birth-death processes. A MC simulation 
yields a single realization (or run) of the stochastic dynamics of the population. Clearly, in order to obtain the 
PDF of population sizes, one has to calculate the corresponding histogram by the use of binning. 

For the sake of completeness, the MC-Gillespie algorithm is here briefly illustrated in the case of a well-mixed 
(randomly-mating) population that is of direct relevance for our purposes. As before, we consider birth and 
death rates respectively given by T,^ and . Starting with hq mutant alleles, the following loop has to be 
repeated until the number of mutant alleles reaches the absorbing state: 
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• Calculate the probabilities for birth and death in the next time step. The probability that birth occurs 
is Pb — /[T^ + T^], while the probability of death is = 1 ~ -Pfc, where n is the current number of 
mutants A'. 

• Update the physical time of the next step. This time is drawn from an exponential distribution with mean 
that equals the inverse of the sum of the birth and death reaction rates. That is, the probability that a 
time At has elapsed since the last step satisfies: P{M) = (l/r)e-'^*/'^, where T = [T+ + T-]-\ 

• Update number of mutants. Choose a random number c between and 1. If c < Pi,, birth occurs and n 
increases by 1; otherwise death occurs and n decreases by 1. 

• If the updated number of mutants n reaches one of the absorbing states, exit the loop. Otherwise go back 
to the first step. 

This algorithm generates a single erratic trajectory that mirrors the stochastic dynamics of the population. The 
time in these simulations is the real physical time, and therefore, averaging over the fixation time of all these 
trajectories yields the MFT of the process. 

It is worth n oticing that the both the above simulation algorithm and the pseudo-sampling method used in 



( Kimural (jl980l )) keep track of the current number of mutants n{t) by drawing a random number at each time 
step and by sequentially updating n{t). However, while in our simulations n(t) is updated by ±1 at each time 
step according to the current probability to undergo birth or death, Kimura's algorithm updates n{t) by adding 
to it a random number with mean zero and variance that coincides with that of the underlying diffusion process. 
Hence, while our simulations exactly mirror the (full) underlying evolutionary stochastic process (whose PDF 
obeys the master equation (O), Kimura's pseudo-sampling method replicates the predictions of the diffusion 
approximation (|6]) and is therefore expected to be accurate only in the limit of weak selection intensity. 
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